(57) Abstract: Provided is a computer implemented method of analyzing a macromolecule for potential binding sites, comprising:
positioning an instance of a computer representation of an organic fragment at a plurality of potential binding sites of the
macromolecule; selecting a value of B, wherein $B = \mu'/kT + \ln\langle N\rangle$, where $\mu'$ is the excess chemical potential, k is Boltzmann's constant,
T is the absolute temperature, and ⟨N⟩ is the mean number of molecules of the organic fragment; repositioning the instances of
the organic fragment until a minimized energy state is obtained; assessing, for each instance of the repositioned organic fragment,
whether the repositioned organic fragment binds to the macromolecule at the associated potential binding site at the selected value
of B; deleting instances of the organic fragment that do not bind at the associated potential binding site at the selected value of B;
repeating these steps at a lesser value of B; and outputting a list of undeleted instances of the organic fragment; provided that the
organic fragment is not water.



COMPUTATIONAL PROTEIN PROBING TO IDENTIFY BINDING SITES

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to methods of identifying binding sites on
proteins, methods for identifying classes of compounds suitable for binding a
protein, and methods of conducting combinatorial chemistry to identify compounds that
interact with a protein to affect a biological process.

Related Art 

Determinations of protein structures have to date been conducted by
isolating crystals of the protein of interest and analyzing their structure by X-ray
crystallography. Typically, the protein has been co-crystallized with a heavy
metal component, or subjected to multiple co-crystallizations, with the heavy
metal providing a reference for solving the crystallographic data.

With a determination of the structure of a protein, or the structure of
another macromolecule having significant tertiary structure, such as a DNA or
RNA, workers often seek to identify the binding sites that are or may be of
significance to a biological process, such as an enzyme active site or a site for
interacting with another macromolecule or with itself. Computational efforts
have focused on sampling the surface of a molecule to find good
fits with known binding agents. These methods have had modest success, and
are dependent on knowledge of (a) the structure of good binding agents and,
often, (b) the function of the protein. A more traditional approach has sought
to co-crystallize binding substances with the macromolecule to identify
binding sites. With the binding site identified, educated guesses can be made
as to new molecules that could bind the site. These educated guesses can
guide synthetic methods, including combinatorial chemistry methods, to make
and test new molecules. When such prospective binding agents prove to be
effective binding agents, and possibly are also found effective in an



appropriate biological model, the structural correlations drawn from the results
can be [...] binding site to make still further
inferences about the structure [...] to a biological function. This
co-crystallization approach depends on an initial knowledge of active agents, and
is experimentally difficult and time consuming.

The present inventor has found a method of identifying, from a
three-dimensional structural solution of a macromolecule, the binding sites for
molecules. The structure used as the basis for the method can be
derived from crystallography, spectroscopic analyses such as NMR,
computational derivations, or any other method of determining the structure of
a macromolecule. The method does not require, and typically does not use, information
on the function of the macromolecule, as the method avoids subjective biases
and instead depends on physical parameters. Further, the method can
be refined [...] choices of binding sites and identify
the functionalities, i.e., organic fragments or ORFs, that effectively interact
with the binding site(s). The data obtained for ORFs further identify the
orientations of the functionalities useful in a candidate binding agent, thereby
providing a tool for searching chemical databases to identify candidate binding
agents. Where the methods described herein identify more than one potential
binding site, the data generated through these methods can be used to
energetically rank the binding sites, and thereby quantitatively determine
which site has the potential to more strongly bind molecules.

The computational method described here generates maps of binding 
site preferences that are nearly identical with maps produced by compiling 
data generated by traditional methods, but with one important difference: the
experimentally produced data took many years to accumulate, while the data
described herein can be produced in no more than a few weeks.
The invention provides an important development in unbiased simulation 
methods for predicting the character of agents that bind to biological 
macromolecules to affect the function of the macromolecules. 






SUMMARY OF THE INVENTION

In one embodiment provided is a method of identifying binding sites
on a macromolecule comprising: (a) for at least one organic fragment (ORF),
conducting, at separate values of parameter B, two or more simulated
annealing of chemical potential calculations using the ORF as the inserted
solvent; and (b) comparing converged solutions from step (a) to identify first
locations at which the ORF is strongly bound, thereby identifying
candidate sites for binding ligand molecules. In one preferred aspect, the
method further comprises: (c) identifying clusters of sites that strongly bind an
ORF. In another preferred aspect, the method further comprises:
(d) conducting steps (a) and (b) for two or more ORFs and identifying
clusters where two or more distinct ORFs bind. Preferably, a cluster that
binds three or more distinct ORFs is identified. The method can identify
further functionalities that contribute to the binding of bioactive agents by
reducing the [...] of a cluster to further identify
elements that would contribute to the binding of a bioactive agent.

In another preferred aspect, the method further comprises:
(e) conducting, at separate values of a measure of chemical potential, two or
more simulated annealing of chemical potential calculations using water as the
inserted solvent; (f) comparing converged solutions from step (e) to identify
locations at which water is strongly bound, thereby identifying locations on
the protein which are not candidate sites for binding ligand molecules; and
(g) identifying first locations that are not water locations.

In still another preferred aspect, the simulated annealing of chemical
potential calculations comprise multiple steps of sampling, wherein in a
number of steps of the sampling the ORF's position is changed by a small
amount and the resulting new position is accepted or rejected based on the
change in energy resulting from the attempted change.

Further provided is a method of identifying the chemical
characteristics of compounds that bind a macromolecule comprising
examining the functionalities and relative orientations of the ORFs found in a
cluster pursuant to the binding site identifying method outlined above.

Also provided is a method of conducting combinatorial chemistry to
identify compounds that interact with a macromolecule comprising:
(a) identifying classes of reactants that are modeled by the functionalities of
the ORFs found by the binding site identifying method applied to the
macromolecule; (b) designing a combinatorial synthetic protocol that calls for
two or more synthetic procedures that react reagents of at least two of the
classes identified in (a); and (c) conducting the combinatorial synthetic
protocol to create candidate binding molecules.

Further provided is a method of conducting a bioactive agent discovery
process comprising: (a) from a group of established combinatorial synthetic
protocols or collections of chemical compounds or pools of chemical
compounds, identifying those members of the group that provide a high
density of compounds that meet selection criteria for a macromolecule
identified from the binding site identifying method; and (b) conducting
binding or functional assays to identify compounds obtained from the
identified collections or protocols which bind or affect the function of the
macromolecule.

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1A illustrates a solved crystal structure, while Figure 1B
displays the structure with a grid imposed.

Figures 2A-2D display the method of the invention applied to the 
crystallographic solution of elastase; the method can be exemplified using 
methanol as the ORF. 

Figures 3A and 3B show the combined results for several ORFs bound 
to elastase after simulations at relatively low B values, with the results in 
Figure 3B filtered to identify clusters of these bound ORFs. 



Figure 3C shows the two clusters of Figure 3B which remain after
excluding strong water binding sites, and Figure 3D shows the one cluster that
remains after extending the analysis to another ORF; Figure 3E shows the
analysis extended to still a further ORF.

The panels of Figure 3F compare the simulation results to a
co-crystallography result.

Illustrated in Figure 4A are the amide binding sites extracted from the
data of six co-crystallization experiments with elastase and known ligands,
and illustrated in Figure 4B is a cluster of the highest affinity amide binding
sites determined by the simulation method of the invention.

Illustrated in Figure 4C are the amide ORFs of Figure 4B plus amides
which are in the vicinity of the cluster but which appear in the simulation at
second highest affinity binding values.

In Figures 5A and 5B, solutions obtained with co-crystals of elastase
inhibitors are compared with data obtained by the methods herein described.

Figures 6A and 6B show the surfaces of elastase involved in binding
ligands as indicated by the crystallographic data, Figure 6A, and as indicated
by the solutions obtained using the method described herein, Figure 6B.

Figure 7 shows a schematic illustration of the type of titrations for
water binding to a macromolecule that can be used to help identify a level of
relatively strong water binding.

DETAILED DESCRIPTION OF THE INVENTION 

Definitions 

The following terms shall have, for the purposes of this application, the
respective meanings set forth below.

"Bioactive agent" refers to a substance such as a chemical that can act
on a cell, virus, organ or organism, including but not limited to drugs (i.e.,
pharmaceuticals), to create a change in the functioning of the cell, virus, organ
or organism. In a preferred embodiment of the invention, the method of
identifying bioactive agents of the invention is applied to organic molecules
having molecular weight of about 600 or less or to polymeric species such as
peptides, proteins, nucleic acids, proteoglycans and the like. A bioactive agent
can be a medicament, i.e., a substance used in therapy of an animal, preferably
a human.

"Cluster of free grid points" refers to free grid points that form a
"cluster" in that, relative to a given ORF, there is a sufficient number of
nearby or adjacent free grid points that the
ORF could be inserted at the cluster. Thus, the free grid points for
H2O must be defined to identify [...] or interior of a
macromolecule that could accommodate H2O, [...] the selection criteria
should err toward identifying some volumes that do not accommodate H2O, as
needed to assure that all appropriate volumes are sampled in the simulation
process. A cluster of free grid points is defined differently depending on the
size of the ORFs (e.g., compare H2O and benzene) and the spacing of the grid.

A "cluster of ORF binding sites" typically refers to a plurality of closely
located or superimposed sites that bind ORFs with sufficient affinity to merit
further consideration.

"Collection of chemical compounds" refers to any collection of
compounds collected or organized with the intention that they can be
examined to identify bioactive agents (e.g., having a biological activity
measured directly or through a surrogate for biological activity such as
binding to a macromolecule or interfering with a function of a
macromolecule). The collection can be prepared from a collection of simpler
molecules (which can be bound to a support) by a chemical scheme designed
to generate a diversity of chemicals. Collections of this latter type are often
referred to as "combinatorial libraries."

"Free grid points" refers to grid points (which are discussed below)
which are, for a given accepted definition of atomic radius, "free" in that they
do not fall within the atomic radii of the mapped atoms of the relevant
macromolecule.



"Macromolecule" refers to a molecule or collection of molecules which
has a time-averaged tertiary structure. Thus, while the term typically refers to
proteins, ribonucleic acids, structures formed of both nucleic acid and protein,
carbohydrates, structures formed of two or more of the aforementioned, and
the like, it can also refer to structures formed with other molecules including
lipids. Macromolecules are used in the method described herein with
reference to maps of their tertiary structure. Such maps are typically
generated by X-ray diffraction studies, which have generated maps for
thousands of macromolecules. However, maps can be produced by other
methods such as computational methods or computational methods
supplemented by other data such as NMR data. While computational methods
have been difficult to apply, recent studies appear to have achieved some
successes.

"Organic fragments" or "ORFs" are molecules or molecular fragments
that can be used to model one or more modes of interaction with a
macromolecule, such as the interactions of carbonyls, hydroxyls, amides,
hydrocarbons, and the like.

"Water locations" are locations at which water is strongly bound,
meaning, in one embodiment, for example, locations where the simulation
indicates water remains bound when the simulation is run at values of B that
are equal to or less than the B value for the transition point indicating those
water molecules that are strongly influenced by the macromolecule.
Illustrated in Figure 7 is a conceptualization of the titration of simulated bound
water molecules with decreasing values of B, a parameter described further
below. A transition point indicates water molecules that are strongly
influenced by the macromolecule. A B value less than or equal to that at the
transition point can be designated as defining water binding of sufficient
strength to render competitive binding by another molecule unlikely, as
illustrated by point SB in the illustration. Typically, for a water-soluble
protein, this point SB is selected so that about 50 to about 100 water molecules
remain bound for a 50 kDa protein.
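One way to picture the choice of a point such as SB is as a search over a simulated water titration. The sketch below is hypothetical. It assumes the titration is available as a mapping from each simulated B value to the number of waters still bound, and it returns the highest B whose count falls within a target window, here the roughly 50 to 100 waters quoted above for a 50 kDa protein. The function name and the titration numbers are illustrative only.

```python
def select_strong_water_b(titration, low=50, high=100):
    """Return the highest B whose bound-water count lies in [low, high], or None.

    titration: dict mapping a simulated B value -> number of waters still bound
    (hypothetical input; in practice the counts come from the simulations).
    """
    candidates = [b for b, n_bound in titration.items() if low <= n_bound <= high]
    return max(candidates) if candidates else None


# Made-up titration: fewer waters stay bound as B is lowered.
titration = {10: 900, 5: 600, 0: 320, -5: 140, -8: 95, -10: 70, -12: 40}
print(select_strong_water_b(titration))
```

Taking the highest qualifying B keeps the most water locations that still satisfy the window. A lower choice would keep only the tightest-bound waters.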



The simulation process of the present invention works by artificially
inserting a given ORF at an unbiased sampling of all the sites on or within a
macromolecule structure where such ORF can, as a practical matter, reside.
These sites can be termed "sampling sites." Typically, a schedule of
simulations is run for each of a number of ORFs, with each simulation run at
a separate value of a parameter B, which is related to the excess chemical
potential. The schedule provides for simulations conducted at each of a
number of B values, typically ranging from about 10 to about -15. In each
simulation at a given value of B, the simulation assesses at each step
whether the insertion of the ORF at a given site shall be accepted or
rejected, with the assessment based on a grand canonical ensemble probability
density function. At each step of the simulation, the algorithm models the
insertion of the ORF at the site. A forced bias canonical probability density
function is used to translate and rotate the ORF in small steps (e.g., ±0.2 Å,
±30°) to identify an energy-minimized insertion given the simulation
parameters in place at the time of the simulation step. The probability of the
insertion is then determined from the canonical ensemble probability
density function, and the ORF can be represented as resident at the site by a
random number generating protocol weighted to the probability value.
Alternative methods for choosing to make this representation, such as
applying cutoff values for when to represent the insertion or not, can also be
applied, but are less favored. Typically, following a successful insertion,
subsequent deletion attempts at the site are made with the previously identified
translated and rotated ORF, and this translated and rotated ORF is used until a
deletion attempt succeeds. The simulation is typically conducted for a large
number of steps, such as 2 × 10^6 steps, with the majority of the steps, e.g.,
1.5 × 10^6, required to "equilibrate" the simulation so that the number of
accepted insertions is equal to the number of deletions on average.
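The small translate-and-rotate step described above (±0.2 Å, ±30°) can be sketched as a rigid-body perturbation. This is a minimal sketch under stated assumptions. It draws a plain uniform random move rather than the forced-bias move the text describes, it rotates about the fragment centroid, and the function name `random_rigid_move` is hypothetical.

```python
import math
import random


def random_rigid_move(coords, max_shift=0.2, max_angle_deg=30.0, rng=random):
    """Return a copy of a rigid fragment after a small random translation and rotation.

    coords: list of (x, y, z) atom positions in angstroms.
    Translation: uniform in [-max_shift, +max_shift] on each axis.
    Rotation: uniform angle in [-max_angle_deg, +max_angle_deg] about a random
    axis through the centroid. This is a plain uniform move, with no force bias.
    """
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    # Random unit axis: a normalized Gaussian vector is uniform on the sphere.
    while True:
        axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(a * a for a in axis))
        if norm > 1e-12:
            break
    ux, uy, uz = (a / norm for a in axis)
    theta = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    c, s = math.cos(theta), math.sin(theta)
    shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]

    moved = []
    for x, y, z in coords:
        vx, vy, vz = x - centroid[0], y - centroid[1], z - centroid[2]
        # Rodrigues' formula: v' = v cos(t) + (u x v) sin(t) + u (u . v)(1 - cos(t))
        dot = ux * vx + uy * vy + uz * vz
        cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
        rx = vx * c + cx * s + ux * dot * (1.0 - c)
        ry = vy * c + cy * s + uy * dot * (1.0 - c)
        rz = vz * c + cz * s + uz * dot * (1.0 - c)
        moved.append((rx + centroid[0] + shift[0],
                      ry + centroid[1] + shift[1],
                      rz + centroid[2] + shift[2]))
    return moved
```

The move is rigid: interatomic distances are unchanged, and the centroid moves by at most `max_shift` along each axis.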

By taking a large number of unbiased samplings at each sample site
over the latter course of the simulation, such as after every 200 steps of
iterations after equilibrium is achieved, an occupation probability of the ORF
residing at that sample site at the given value of B can be assessed. The
occupancy as an overall result of the method can then be determined based on
this probability, for example with a random number protocol making the
representation based on its probability. The degree to which the ORF is
translated or rotated can also be represented based on the probability of such
translations and rotations.
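The sampling schedule and occupation estimate described above can be sketched as follows. The sketch assumes, hypothetically, that each post-equilibration snapshot is stored as the set of sampling-site ids occupied at that step. With the example numbers from the text (2 × 10^6 steps, 1.5 × 10^6 of them for equilibration, one sample every 200 steps), 2,500 snapshots are collected per simulation.

```python
def production_snapshot_steps(total_steps=2_000_000, equil_steps=1_500_000, every=200):
    """Step numbers at which a configuration is sampled after equilibration."""
    return range(equil_steps + every, total_steps + 1, every)


def occupation_probability(snapshots, site):
    """Fraction of sampled configurations in which `site` holds an ORF.

    snapshots: sequence of sets, each the set of occupied site ids in one sample.
    """
    if not snapshots:
        return 0.0
    return sum(site in occupied for occupied in snapshots) / len(snapshots)
```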

For each ORF, simulations are run at each of a number of values of a
measure of excess chemical potential, such as B. Thus, as this value is lowered,
the retention of an ORF at a given sampling site is an indication of high
relative binding affinity.
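Reading retention across a schedule of B values can be sketched as below. The sketch assumes occupation probabilities have already been estimated per site at each B, and it treats a site as retained when its probability reaches a threshold. The 0.5 threshold and the function names are assumptions for illustration, not values from the text.

```python
def lowest_retaining_b(occupancy_by_b, threshold=0.5):
    """For each site, the lowest B at which its occupation probability >= threshold.

    occupancy_by_b: dict mapping B -> dict mapping site id -> occupation probability.
    A lower retaining B is read as higher relative binding affinity.
    """
    result = {}
    for b in sorted(occupancy_by_b, reverse=True):  # walk from high B to low B
        for site, p in occupancy_by_b[b].items():
            if p >= threshold:
                result[site] = b  # overwritten whenever a lower B still retains it
    return result


def rank_sites(occupancy_by_b, threshold=0.5):
    """Site ids ordered from strongest (lowest retaining B) to weakest."""
    lows = lowest_retaining_b(occupancy_by_b, threshold)
    return sorted(lows, key=lambda site: lows[site])
```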

The sampling sites are typically arrived at by creating a grid as
illustrated in Figure 1. Figure 1 illustrates a solved crystal structure
(Figure 1A) on which a grid is imposed (Figure 1B). For example, the grid
can have about ½ Å to about 1 Å spacing, with the grid intersection points
defining the candidates for sampling sites. The spacing of the grid is
preferably selected to be less than the smallest cross-section of the ORF. The
spacing is typically selected to be small enough in relation to the size of the
ORF that free volumes that could define free grid point clusters are likely to
contain enough free grid points to allow useful sampling as described
below. Such relatively small spacing minimizes the chance that the selection
of how to orient the grid will bias the algorithm against identifying certain
ORF binding preferences. The sampling sites are selected from sites that are
unoccupied by the macromolecule (Figure 1B). A final elimination of "grid
bias" is achieved by varying the test insertion points away from strict initial
insertion at grid points, as described below.
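The grid construction and the definition of free grid points can be sketched in plain Python. The atom list format, the box corners and the function name `free_grid_points` are assumptions. A production code would use a spatial index rather than testing every atom at every grid point.

```python
import itertools
import math


def free_grid_points(atoms, lower, upper, spacing=0.5):
    """Grid points in the box [lower, upper] that fall outside every atomic radius.

    atoms: list of ((x, y, z), radius) pairs for the macromolecule, in angstroms.
    lower, upper: (x, y, z) corners of the region to grid.
    Returns a dict mapping integer grid index (i, j, k) -> (x, y, z) position.
    """
    # Number of grid points along each axis; the small epsilon guards against
    # floating-point round-off in the division.
    counts = [math.floor((hi - lo) / spacing + 1e-9) + 1 for lo, hi in zip(lower, upper)]
    free = {}
    for idx in itertools.product(*(range(n) for n in counts)):
        point = tuple(lo + i * spacing for lo, i in zip(lower, idx))
        # "Free" means not inside the radius of any mapped atom.
        if all(math.dist(point, center) >= radius for center, radius in atoms):
            free[idx] = point
    return free
```

For a single atom of radius 1.5 Å at the origin and a 1 Å grid over a 4 Å box, 19 of the 125 grid points fall inside the atom and 106 are free.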

The sampling sites are limited to those sites having enough adjacent
volume free of the macromolecule to allow the ORF to be inserted. For
example, the sampling sites can be limited to grid points within an open area
of at least about 2 Å × 2 Å × 2 Å (= 8 Å³ or 0.008 nm³), or about 2.5 Å × 2.5 Å ×
2.5 Å (= 15.6 Å³ or 0.0156 nm³), or, for water, about 2.2 Å × 2.2 Å × 2.8 Å. The
grid points can be selected for those free grid points that are within a cluster of
free grid points, such as, for example, a cluster of 3, 4, 5, 6, 7, 8 or more free
grid points, depending on the size of the ORF.
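The text leaves the precise notion of a cluster of free grid points open. As one assumption, the sketch below treats a cluster as a set of free grid points connected through shared faces (6-neighbour connectivity). It keeps only clusters with at least `min_size` points, mirroring the 3 to 8 point examples above. The function name is hypothetical.

```python
from collections import deque


def free_point_clusters(free_indices, min_size=4):
    """Group free grid indices into face-connected clusters; keep those >= min_size.

    free_indices: iterable of integer (i, j, k) grid indices that are free.
    """
    remaining = set(free_indices)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    clusters = []
    while remaining:
        seed = remaining.pop()
        component = [seed]
        queue = deque([seed])
        while queue:  # breadth-first flood fill over free neighbours
            i, j, k = queue.popleft()
            for di, dj, dk in steps:
                neighbour = (i + di, j + dj, k + dk)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    component.append(neighbour)
                    queue.append(neighbour)
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters
```

The keys of a free-grid-point map (integer grid indices) are a natural input.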

In one preferred embodiment, the ORF is not necessarily initially
inserted exactly at the grid points, but instead at a random sampling of
insertion points within a short distance of the grid points, such as points within
a sphere centered at the grid point and having a diameter of about some
percentage, such as 10%, of the grid spacing, or within a box centered at
the grid point having width, length and height of about such a percentage of
the grid spacing. As discussed above, this "wobble" in the initial insertion
point helps eliminate grid bias where the placement of the grid happens to
reduce the chance that a given open volume will be efficiently sampled.
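The "wobble" can be sketched with the box variant described above. A sphere would work the same way, using rejection sampling. The 10% fraction is the example from the text, and the function name is hypothetical.

```python
import random


def wobbled_insertion_point(grid_point, spacing, fraction=0.10, rng=random):
    """Random trial insertion point in a cube centred on a grid point.

    The cube's edge is `fraction` * `spacing` (e.g. 10% of the grid spacing).
    """
    half_edge = 0.5 * fraction * spacing
    return tuple(c + rng.uniform(-half_edge, half_edge) for c in grid_point)
```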

Using the crystallographic solution of elastase, in particular, the pig
pancreatic elastase structural solution of G.A. Petsko of Brandeis University,
the method can be exemplified using methanol as the ORF. Figure 2A shows
the final solution using a relatively high B value, e.g., B = 10. Figure 2B
shows the final solution using an intermediate value, e.g., B = 6 or 7. Figure
2C shows the final solution using a lower intermediate value, e.g., B = 0 or -2
or -4. Figure 2D shows the final solution using a restrictive value, such as B =
-14. As illustrated, with lower values of B fewer and fewer methanol molecules
remain bound. These remaining methanol fragments indicate those that bind
with relatively high affinity.

The next step of the process is to conduct simulations with additional
ORFs and identify clusters of relatively high affinity ORF binding sites. Thus,
for example, again using elastase, simulations can be conducted to determine
binding for ammonia, methanol, ketone and amide ORFs. Combined
results at relatively low B values are illustrated in Figure 3A. Clusters of ORF
binding sites are identified in Figure 3B. The method of the present invention
seeks to identify clusters of ORF binding sites, where the clusters can be made
up solely of one type of ORF. Preferably, however, the cluster will include
binding sites for 2, 3, 4, 5, 6, 7 or more distinct ORFs.
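Identifying clusters of high-affinity ORF binding sites that include several distinct ORFs could be done with single-linkage clustering on site centres. The linkage rule, the 2.0 Å cutoff and the function name are assumptions for illustration, not parameters given in the text.

```python
import math


def orf_site_clusters(sites, cutoff=2.0, min_types=2):
    """Single-linkage clustering of high-affinity ORF binding sites.

    sites: list of (orf_name, (x, y, z)) tuples.
    Two sites are linked when their centres lie within `cutoff` angstroms.
    Returns clusters (lists of sites) containing >= min_types distinct ORF names.
    """
    parent = list(range(len(sites)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            if math.dist(sites[i][1], sites[j][1]) <= cutoff:
                parent[find(i)] = find(j)  # union the two groups

    groups = {}
    for i, site in enumerate(sites):
        groups.setdefault(find(i), []).append(site)
    return [g for g in groups.values() if len({name for name, _ in g}) >= min_types]
```

Raising `min_types` to 3 corresponds to the preference above for clusters that bind three or more distinct ORFs.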
Examples of useful ORFs include:

Name               Structure
Acetone            CH3(C=O)CH3
Aldehyde           H(C=O)-CH3
Amide              H(C=O)NH2
Ammonia            NH3
Benzene            C6H6 (aromatic ring)
Carboxylic Acid    CH3COOH
1,4-Diazine        C4H4N2 (aromatic ring)
Ester              CH3-O-(C=O)-CH3
Ether              CH3-O-CH3
Formaldehyde       H2C=O
Furan              C4H4O (aromatic ring)
Imidazole          C3H4N2 (aromatic ring)
Methane            CH4
Methanol           CH3OH
Phospho-Acid       O=P(OH)3
Pyridine           C5H5N (aromatic ring)
Pyrimidine         C4H4N2 (aromatic ring)
Pyrrole            C4H4NH (aromatic ring)
Thiol              CH3SH
Thiophene          C4H4S (aromatic ring)

Preferably, the ORFs selected are representative of chemical features that have
proven useful in the design of pharmaceuticals or other bioactive chemicals.

Thus, in a first mode of analysis, an important part of the process is to
run the simulations with several ORFs, identifying clusters of sites that bind
multiple ORFs with relatively high affinity. These clusters are strong
candidate ligand binding sites. Moreover, the relative positioning of
the ORFs is instructive of the features of good binding agents. For example, at
the binding site identified on elastase by the methods described below, a
cluster having two benzene rings with an amide interposed between them
models some of the strongest elastase inhibitors derived from an extensive
research program, which inhibitors have a sulfonamide in place of the
carbon-based amide of the simulation. See Tables XXIII and XXV of Edwards et al.,
Med. Res. Rev. 14:127-194 (1994).

In some implementations of the invention, clusters of ORF binding
sites alone will identify, or substantially narrow the range of choices for, the
sites at which ligands interact with a given protein. However, in some
embodiments of the invention, the sites that strongly bind water are identified,
and the clusters that intersect with strong water binding sites are discounted.
Thus, in the elastase example, the candidate ligand binding sites of Figure 3B
are narrowed by excluding water binding sites, as illustrated in Figure 3C. If
the analysis is extended to five ORFs as illustrated in Figure 3D, a single
candidate site remains. Figure 3E shows a slightly different perspective of the
same site illustrated in Figure 3D, with the analysis extended to six ORFs.
Figure 3F shows how well the candidate site (left panel) matches up with the
structure of a co-crystal containing the ligand [...]
isopropylanilide.

Accordingly, in a second mode of analysis, an optional step in the
process is to narrow the choices for ligand binding sites by excluding ORF
clusters that intersect with relatively strong water binding sites.
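The second mode of analysis, discounting clusters that intersect strong water sites, can be sketched as a distance test. What counts as "intersecting" is not defined above. The sketch assumes a site-to-water distance cutoff of 1.5 Å, which is a hypothetical value, as is the function name.

```python
import math


def exclude_water_clusters(clusters, water_sites, cutoff=1.5):
    """Drop ORF clusters that have any site within `cutoff` of a strong water site.

    clusters: list of clusters, each a list of (orf_name, (x, y, z)) tuples.
    water_sites: list of (x, y, z) positions of strongly bound waters.
    """
    def touches_water(cluster):
        return any(math.dist(xyz, w) <= cutoff for _, xyz in cluster for w in water_sites)

    return [c for c in clusters if not touches_water(c)]
```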

It should be noted that clusters of ORFs are typically identified at
relatively low B values, thereby helping to [...]
for ligands. However, further information about prospective binding sites can
be gleaned by looking, in the vicinity of a prospective binding site, at more
weakly binding ORFs. The value of this information stems from the possibility that
more weakly binding ORFs model a ligand interaction which, while weak
in isolation, makes a real contribution to the ligand binding affinity of a bioactive
agent as a whole. Illustrated in Figure 4A are the amide binding sites
extracted from the data of six co-crystallization experiments with elastase and
known ligands. Illustrated in Figure 4B is a cluster of the highest affinity
amide binding sites determined by simulation. Illustrated in Figure 4C are the
amide ORFs of Figure 4B plus amides which are in the vicinity of the cluster
but which appear in the simulation at the second highest affinity values. As
illustrated, this last step of expanding the results by looking at neighboring
lower affinity ORF binding sites helps to better model the results seen in
co-crystallography. Specifically, the cluster results identify the site at which the
majority of amide binding sites are seen in crystallography, but the expansion
extends the results to another cleft in elastase where amides have been
experimentally located. Additionally, the expansion identifies part of another
cleft at which ligand interactions are seen (as will be illustrated in other
Figures).

Thus, in a third mode of analysis, the features of ligand binding sites
indicated by other modes of analysis are expanded upon by looking to less
stringent simulation results in the vicinity of ORF clusters. The above
illustration focused on a cluster of one type of ORF, but is applicable to
clusters of many types of ORFs, where the expansions can be limited to one
type of ORF or multiple types of ORFs.
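The third mode, expanding a cluster with weaker-binding ORF sites in its vicinity, can be sketched as a radius search around the cluster members. The 3.0 Å radius and the function name are assumptions. Filtering `weaker_sites` by ORF name beforehand restricts the expansion to one ORF type, as the text allows.

```python
import math


def expand_cluster(cluster, weaker_sites, radius=3.0):
    """Add lower-affinity ORF sites lying within `radius` of any cluster member.

    cluster, weaker_sites: lists of (orf_name, (x, y, z)) tuples.
    """
    extra = [site for site in weaker_sites
             if any(math.dist(site[1], member[1]) <= radius for member in cluster)]
    return cluster + extra
```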

The data in Figures 4A-4C illustrate an important concept. Both in
actual ligand bindings and in the simulations, multiple effective binding
locations and orientations for a given type of moiety can be found to overlap.
This reflects the existence of multiple local minima. In real-world
interactions, rather than the low-temperature averaging obtained by crystallography,
binding interactions will reflect a range of such local minima.

In Figures 5A and 5B, solutions obtained with co-crystals of elastase
inhibitors are compared with data obtained by the methods herein described.
In Figure 5A, the solutions for six co-crystallized inhibitors are shown, with
the inhibitor molecules overlaid on each other (non-space-filling
representation, with the elastase segment represented by a space-filling
illustration). These inhibitors are
trifluoroacetyl-L-lysyl-L-prolyl-p-isopropylanilide (crystal solution: Mattos et al., as submitted April 30, 1994),
trifluoroacetyl-L-lysyl-L-leucyl-p-isopropylanilide (crystal solution: Mattos et al., as submitted June 22, 1994),
trifluoroacetyl-L-phenylalanyl-p-isopropylanilide (crystal solution: Mattos et al., as submitted April 30, 1994),
trifluoroacetyl-L-phenylalanyl-L-[...] (crystal solution: Mattos et al., as submitted February 14, 1995),
trifluoroacetyl-L-valyl-L-alanyl-p-trifluoromethylanilide (crystal solution: Mattos et al., as submitted February 14, 1995) and
N-(tert-butoxycarbonyl-alanyl-alanyl)-O-(p-nitrobenzoyl)hydroxylamine (crystal solution: Ding et al., as submitted July 10, 1995).
In Figure 5B, the solutions for approximately 10 ORFs, which
are in their respective high affinity protein [...], are overlaid. Both
methods identify a region which favors the moieties. The
simulation process achieves approximately 90% 3D geometric identity with
the crystallography results.

Figures 6A and 6B show the regions of elastase involved in binding
ligands as indicated by the crystallographic data, Figure 6A, and as indicated
by the solutions obtained from the computational method described herein,
Figure 6B.

The simulation of the invention utilizes a Monte Carlo algorithm. The
form of Monte Carlo simulation useful in the invention is described in
Frenkel and Smit, Understanding Molecular Simulation: From Algorithms to
Applications, Academic Press, New York (1996). The simulation method can
comprise:

Locate a numeric representation of the macromolecule in a periodic cell.

Optimize the position of the macromolecule in the cell.

Locate all the cavities in the macromolecule, whether interior or surface cavities.

Insert and delete the ORFs (including water) in these cavities.

Compute the probabilities of occupation of the ORFs using a grand
canonical ensemble probability density function.

Vary the chemical potential, yielding relative free energies of binding.

The methodology, grand-canonical ensemble simulation, can be
introduced as follows:

Grand-Canonical Ensemble Simulations 

The distinguishing feature of simulations in the grand-canonical
ensemble is the change in the number of molecules (ORFs) in the system
during the simulation. In other words, the sampling is not restricted to the
configuration space of a given dimension but has to be extended to a set of
configuration spaces. Applicant has found, unexpectedly, that the [...]
of allowing for these changing numbers of molecules and the resulting
changing mass nonetheless makes the simulation computationally far
more efficient. The change in the number of molecules corresponds to the fact
that the grand-canonical partition function Ξ is the linear combination of the
corresponding canonical partition functions of different numbers, N, of
molecules:

$$\Xi(\mu, V, T) = \sum_{N=0}^{\infty} \frac{\exp(\mu N / kT)}{N!}\, Q(T, V, N) \qquad (1)$$

where T is the absolute temperature, μ is the chemical potential, k is the
Boltzmann constant, and Q(T,V,N) is defined by:

$$Q(T, V, N) = q^N \int \exp\left(-E(X^N)/kT\right) dX^N \qquad (2)$$

with q being the molecular partition function and E(X^N) the energy of the
configuration X^N of the N molecules.

The sampling of the configuration space of N molecules (ORFs) has
been shown to be feasible using Metropolis Monte Carlo methods in which, at
each step of the sampling, a molecule's (ORF's) position is changed by a small
amount and the resulting new conformation is accepted or rejected based on
the change in energy, ΔE, resulting from the attempted change. This position
shifting can be thought of as effecting a "shaking" of the ORF to identify its
favored positioning, and the "shaking" methodology, which can be biased in
the direction of the forces, can be termed "forced bias Monte Carlo." When
this shaking is applied, the simulation solutions reflect higher probability
orientations. Accordingly:

$$P^{\mathrm{acc}}_{\mathrm{move}} = \min\left(1, \exp(-\Delta E / kT)\right) \qquad (3)$$

Notice that the temperature (kept constant during the simulation) enters the
acceptance formula as a scaling factor of the energy change.
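Equation (3) is the standard Metropolis acceptance rule for a move. A minimal sketch follows, assuming energies and kT are given in the same units (for example kcal/mol, where kT is about 0.6 near room temperature). The function names are hypothetical.

```python
import math
import random


def move_acceptance_probability(delta_e, kT):
    """min(1, exp(-delta_e / kT)), the Metropolis rule for a trial move."""
    if delta_e <= 0.0:
        return 1.0  # downhill or neutral moves are always accepted
    return math.exp(-delta_e / kT)


def accept_move(delta_e, kT, rng=random):
    """Metropolis test: accept with probability move_acceptance_probability."""
    return rng.random() < move_acceptance_probability(delta_e, kT)
```

Returning early for ΔE ≤ 0 avoids evaluating `exp` of a large positive number for strongly favourable moves.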

Generalizing the canonical-ensemble Metropolis method to simulations
in the grand-canonical ensemble calls for steps where the number of molecules
(ORFs) changes. Operationally, this requires either the deletion of an existing
molecule or the "creation", i.e., insertion, of a new one. It has been shown that
when the deleted molecule is chosen randomly, the deletion attempt
should be accepted with the following probability:

$$P^{\mathrm{acc}}_{\mathrm{del}}=\min\left(1,\,N\exp(-\Delta E/kT-B)\right) \tag{4}$$

where

$$B=\mu'/kT+\ln\langle N\rangle \tag{5}$$

with μ′ being the excess chemical potential, N the number of molecules
(ORFs), ⟨N⟩ its Boltzmann average and V the volume of the system (which is
constant during the simulation). An attempt to insert a molecule (ORF) at a
random location is accepted with the following probability:



$$P^{\mathrm{acc}}_{\mathrm{ins}}=\min\left(1,\,\frac{\exp(-\Delta E/kT+B)}{N+1}\right) \tag{6}$$



Here the effect of the chemical potential is introduced into the acceptance
expression via the B parameter. The presence of the factors V and N follows
from the relation between the canonical and grand-canonical partition
functions: when a molecule (ORF) is taken out of the system, the integration
over its coordinates (in Q_N) yields a factor of V, and N is the last factor of N!.
These factors can also be given a probabilistic interpretation: the insertion site
is chosen with probability 1/V and the molecule (ORF) to be deleted is
chosen with probability 1/N.
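The insertion and deletion rules of Equations (4) and (6) can be written as two small functions. This sketch assumes the standard grand-canonical forms given above, with ΔE always meaning E(new) − E(old) and N counting molecules before the attempt; the function names are hypothetical. Working with logarithms avoids overflowing `exp` when B is large or ΔE is strongly negative.

```python
import math

def p_accept_insert(delta_e: float, kT: float, B: float, n_before: int) -> float:
    """Eq. (6): min(1, exp(-dE/kT + B) / (N + 1)); N counts molecules before insertion."""
    log_p = -delta_e / kT + B - math.log(n_before + 1)
    return 1.0 if log_p >= 0.0 else math.exp(log_p)

def p_accept_delete(delta_e: float, kT: float, B: float, n_before: int) -> float:
    """Eq. (4): min(1, N exp(-dE/kT - B)); N counts molecules before deletion."""
    if n_before == 0:
        return 0.0  # nothing to delete
    log_p = math.log(n_before) - delta_e / kT - B
    return 1.0 if log_p >= 0.0 else math.exp(log_p)
```

A quick consistency check: inserting a molecule with ΔE into a system of N molecules and then deleting it again (ΔE of opposite sign, N + 1 molecules) gives a ratio of acceptance probabilities equal to exp(−ΔE/kT + B)/(N + 1), the ratio of the corresponding grand-canonical weights.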

The simulation proceeds by alternating attempts to move, insert and
delete molecules (ORFs), accepting them with probabilities P^acc_move,
P^acc_ins and P^acc_del as defined in Equations (3), (6) and (4) above,
respectively. After sufficiently long runs, the number of molecules (ORFs) N
will fluctuate around its Boltzmann average ⟨N⟩. If a given density has to be
simulated, it is generally necessary to try different B values. In this regard, it
is useful to note the following relationship:



$$\left(\frac{\partial\langle N\rangle}{\partial B}\right)_{T,V}=\langle N^{2}\rangle-\langle N\rangle^{2} \tag{7}$$



This method has been found useful in simulating atomic fluids at
moderate densities but runs into difficulties when room-temperature liquids
are simulated. The difficulty stems from the fact that most insertion attempts
will be at positions where there already is a molecule (e.g., from the solved
protein structure), resulting in a large ΔE and the probable rejection
of the attempt.
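Equation (7) suggests one way of choosing B for a target density: the fluctuation of N during a run estimates the slope d⟨N⟩/dB, so B can be corrected by a Newton-style step. The update rule below is an assumption for illustration (it treats ⟨N⟩ as locally linear in B) and is not described in the source; in practice the run would be repeated at the new B until ⟨N⟩ settles near the target.

```python
from typing import Sequence

def update_B(B: float, n_samples: Sequence[int], n_target: float) -> float:
    """One Newton-style correction of B toward a target mean molecule count.

    Uses Eq. (7): d<N>/dB = <N^2> - <N>^2, estimated from the N values
    recorded during a run at the current B.
    """
    mean_n = sum(n_samples) / len(n_samples)
    var_n = sum((n - mean_n) ** 2 for n in n_samples) / len(n_samples)
    if var_n == 0.0:
        raise ValueError("N did not fluctuate; sample longer before updating B")
    return B + (n_target - mean_n) / var_n
```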

To increase the efficiency of insertion attempts, a cavity-biased
insertion technique was introduced. Insertions are attempted only at sites
where a cavity of suitable size already exists, thereby ensuring a
non-negligible probability of acceptance. However, to ensure that the simulation
thus modified still produces the required Boltzmann distribution, both the
insertion and deletion acceptance probabilities have to be modified. The
modified expressions involve the probability of finding a cavity when there
are N molecules (ORFs) present, P_N^cav, and follow from:

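$$P^{\mathrm{acc}}_{\mathrm{ins}}=\min\left(1,\,\frac{P^{\mathrm{cav}}_{N}\exp(-\Delta E/kT+B)}{N+1}\right) \tag{8}$$

$$P^{\mathrm{acc}}_{\mathrm{del}}=\min\left(1,\,\frac{N\exp(-\Delta E/kT-B)}{P^{\mathrm{cav}}_{N-1}}\right) \tag{9}$$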

In Equations 8 and 9, P^cav represents the relative volume of the regions of
the system that contain cavities of suitable size, i.e., the probability that a
randomly chosen point lies in such a cavity. The efficiency of the
cavity-biased method follows from the fact that the algorithm searching for
cavities also yields P^cav without extra steps. Calculations on a variety of
fluids (e.g., water and benzene, which can serve as ORFs) have confirmed that
the cavity-biased method significantly increases the efficiency of insertion
attempts and allows modeling of densities that proved to be impractical
without this improvement.
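The cavity search and the modified acceptance rules of Equations (8) and (9) can be sketched as below. The source does not say how cavities are located; here, as an assumption, random trial points are scattered through a cubic periodic box and a point counts as a cavity if no atom lies within a hypothetical radius `r_cav`. The fraction of trial points that qualify is the estimate of P^cav, and an insertion would then be attempted at one of the returned cavity points.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def min_image_dist2(a: Vec3, b: Vec3, box: float) -> float:
    """Squared distance under cubic periodic boundary conditions (minimum image)."""
    d2 = 0.0
    for x, y in zip(a, b):
        d = x - y
        d -= box * round(d / box)
        d2 += d * d
    return d2

def find_cavities(atoms: Sequence[Vec3], box: float, r_cav: float,
                  n_trials: int, rng: random.Random) -> Tuple[List[Vec3], float]:
    """Sample random trial points; keep those with no atom closer than r_cav.

    Returns the cavity points and P_cav, the fraction of trial points that fell
    in a cavity (an estimate of the cavity volume fraction).
    """
    r2 = r_cav * r_cav
    cavities = []
    for _ in range(n_trials):
        p = (rng.uniform(0, box), rng.uniform(0, box), rng.uniform(0, box))
        if all(min_image_dist2(p, a, box) >= r2 for a in atoms):
            cavities.append(p)
    return cavities, len(cavities) / n_trials

def p_accept_cavity_insert(delta_e: float, kT: float, B: float,
                           n_before: int, p_cav_n: float) -> float:
    """Eq. (8): min(1, P_cav(N) exp(-dE/kT + B) / (N + 1))."""
    if p_cav_n <= 0.0:
        return 0.0  # no cavity was found, so no insertion can be attempted
    log_p = math.log(p_cav_n) - delta_e / kT + B - math.log(n_before + 1)
    return 1.0 if log_p >= 0.0 else math.exp(log_p)

def p_accept_cavity_delete(delta_e: float, kT: float, B: float,
                           n_before: int, p_cav_n_minus_1: float) -> float:
    """Eq. (9): min(1, N exp(-dE/kT - B) / P_cav(N-1))."""
    if n_before == 0:
        return 0.0
    if p_cav_n_minus_1 <= 0.0:
        raise ValueError("P_cav(N-1) must be positive")
    log_p = math.log(n_before) - delta_e / kT - B - math.log(p_cav_n_minus_1)
    return 1.0 if log_p >= 0.0 else math.exp(log_p)
```

The cavity search is the only extra cost per insertion attempt, and it returns P^cav as a by-product, which is the efficiency argument made in the text.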

Water binding 

Aspects of the simulations used in the invention can be illustrated with 
calculations used to determine the strength of water binding to a synthetic 



polynucleotide (Guarnieri and Mezei, J. Am. Chem. Soc. 118: 8493-8494
(1996)). This illustration can be described as follows:

This text illustrates how the method of simulated annealing of 
chemical potential allows bulk waters to be distinguished from bound waters,
and how differentially bound waters may be distinguished from each other 
based on their relative chemical potentials. This is illustrated by showing that 
it takes more free energy to desolvate the minor groove than the major groove 
of a charged DNA dodecamer. 

Grand canonical ensemble simulations are generally performed by
placing a molecule in a periodic simulation cell; setting a parameter B, which
is representative of free energy, in such a way as to achieve an experimentally
determined density; sampling potential hydration positions around the
molecule by inserting and deleting water molecules from the simulation cell
using a technique such as cavity bias (Mezei, Mol. Phys. 57:565-582 (1994);
Resat and Mezei, J. Am. Chem. Soc. 115:7451-7452 (1994)); and accepting or
rejecting each attempt based on a Metropolis Monte Carlo criterion (Metropolis
et al., J. Chem. Phys. 21:1087-1092 (1953)) using a grand canonical ensemble
probability function (Tolman, R., The Principles of Statistical Mechanics,
Dover Press, New York (1971)). The parameter B is related to the excess
chemical potential μ′ as follows: B = μ′/kT + ln⟨N⟩, where k is Boltzmann's
constant, T is the absolute temperature, and ⟨N⟩ is the mean number of
molecules of the ORF, which here is H2O. In the method of simulated
annealing of chemical potential, the simulation is started with a large initial
B-value so that a higher percentage of water insertion attempts are accepted.
This causes the simulation cell to be flooded with water molecules. After this
grand canonical ensemble simulation at high excess chemical potential is
equilibrated, subsequent simulations are carried out at successively lower
B-values. This successive lowering of the B-values causes a gradual removal
of the bulk water molecules from the simulation cell. As the chemical 
potential is further "annealed", a point is reached at which water molecules do 
not readily leave the cell, thereby identifying those water molecules that are 



strongly influenced by the DNA, the so-called "bound water molecules". As
the excess chemical potential is lowered further, some of these bound
waters ultimately start to leave the cell. Since chemical potential is a free
energy, this simulated annealing of chemical potential yields a numerical
estimate of the differential free energy of binding of the different bound water
molecules. It must be emphasized that our use of the term "annealing" applies
strictly to the value of the chemical potential and that the temperature is held
constant, for example at 298 K, in all the simulations. For all simulations the
DNA was held fixed, water molecules were added and deleted throughout all
parts of the cell, extensive canonical Monte Carlo was performed between
accepted grand canonical Monte Carlo steps, and periodic boundary conditions
were used.
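The logic of simulated annealing of chemical potential can be illustrated with a deliberately simplified, hypothetical model rather than the DNA calculation: M fixed binding sites, each with an assumed binding energy ε_i (in units of kT), no positional moves, and grand-canonical insert/delete attempts using Equations (4) and (6). B is lowered step by step, and for each site the sketch records the first B at which the site is occupied less than half of the time. For this independent-site model the equilibrium occupancy is θ_i = x/(1 + x) with x = exp(B − ε_i)/M, so a site is released near B ≈ ε_i + ln M: more strongly bound fragments survive to lower B, which is the property the method exploits.

```python
import math
import random
from typing import List, Optional, Sequence

def anneal_chemical_potential(eps: Sequence[float], b_schedule: Sequence[float],
                              n_steps: int, seed: int = 0) -> List[Optional[float]]:
    """Toy simulated annealing of chemical potential on M independent sites.

    eps[i] is the energy (in units of kT) of a fragment bound at site i; more
    negative means more strongly bound. For each B in the (decreasing)
    schedule, grand-canonical insert/delete attempts are made with the
    acceptance rules of Eqs. (4) and (6). Returns, per site, the first B at
    which its time-averaged occupancy fell below 0.5 (None if it never did).
    """
    rng = random.Random(seed)
    m = len(eps)
    occupied = [False] * m
    release_b: List[Optional[float]] = [None] * m
    for b in b_schedule:
        time_occupied = [0] * m
        for _ in range(n_steps):
            n = sum(occupied)
            if rng.random() < 0.5:
                # Insertion at a random site; an occupied site acts as an overlap and is rejected.
                i = rng.randrange(m)
                if not occupied[i]:
                    log_p = -eps[i] + b - math.log(n + 1)  # Eq. (6) with dE = eps[i]
                    if log_p >= 0.0 or rng.random() < math.exp(log_p):
                        occupied[i] = True
            elif n > 0:
                # Deletion of a randomly chosen occupied site.
                i = rng.choice([k for k in range(m) if occupied[k]])
                log_p = math.log(n) + eps[i] - b  # Eq. (4) with dE = -eps[i]
                if log_p >= 0.0 or rng.random() < math.exp(log_p):
                    occupied[i] = False
            for k in range(m):
                time_occupied[k] += occupied[k]
        for k in range(m):
            if release_b[k] is None and time_occupied[k] / n_steps < 0.5:
                release_b[k] = b
    return release_b
```

For example, `anneal_chemical_potential([-2.0, -6.0, -10.0, -14.0], [2.0 - k for k in range(19)], 4000)` anneals B from 2 down to −16 in unit steps; the returned release values should fall in the same order as the binding energies.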

As an illustration of the method, a simulated annealing of chemical
potential on d(CGCGAATTCGCG)2 was performed, starting with B = 1.0 and
going down to B = -26 in 37 increments, performing 2,000,000 cavity-biased
grand canonical ensemble Monte Carlo steps at each B-value. The final
configuration of the simulation with B = -6 has 1120 bound water molecules;
with B = -8, 533; with B = -9, 390; and with B = -11, 215. The most salient
feature of this progression is the differential hydration of the major and minor
grooves of the DNA. The B = -6 simulation shows the DNA essentially
uniformly solvated. The B = -8 simulation clearly shows that upon lowering of
the chemical potential by 2 B-units, a majority of the extracted non-bulk
waters come from the major groove, while the minor groove remains almost
unaffected. Annealing the chemical potential further (B = -9) still leaves the
minor groove well hydrated while the major groove is almost stripped.
Lowering B even further (B = -11) results in the removal of almost all water
molecules from both the major and
minor grooves. Quantitation of the hydration of the DNA as a function of
chemical potential was computed by proximity analysis (Mehrotra and
Beveridge, J. Am. Chem. Soc. 102:4287 (1980); the effects of different partial
charges on proximity analysis are described in Mezei, Mol. Simul. 1:327-332
(1988)), with the results shown in Table 1.
For B = -6, the first hydration shells (defined by the position of the first
minimum of the radial distribution function) of the major and minor grooves
have comparable densities (0.012 and 0.013, respectively), while the second
hydration shell of the minor groove has twice the density of that of the major
groove. For B = -8 the hydration difference becomes quite pronounced, with
the minor-groove first- and second-shell hydration densities being 2.5-fold and
5-fold higher than those of the major groove, respectively. For B = -11 the
major- and minor-groove hydration densities again become equal because at
this value of the excess chemical potential both grooves are essentially
stripped bare.

Illustrating the differential hydration propensities of the major and
minor grooves of DNA using simulated annealing of chemical potential is
computationally undemanding (3 days of CPU time to run one annealing
schedule and 3 days of CPU time to run one proximity analysis, on an SGI
Power Challenge) because only a coarse "cooling" schedule of the chemical
potential is required. (Calculations of volume elements can be CPU intensive;
the effects of volume element calculations on proximity analysis are described
in Mezei and Beveridge, in Methods in Enzymology, Packer, ed., Academic
Press, New York (1986), pp. 21-47.) Since the chemical potential is a free
energy, a very fine cooling schedule may be used to estimate quantitatively the
hydration free energy difference between two different functional groups, or
even two different atoms, of the DNA. Two atoms that desolvate at the same
B-value have similar solvation free energies or, alternatively, require a finer
cooling schedule to resolve the differences. It should be noted that the model
system used here consisted of ionic DNA with 22 negative charges and no
sodium counterions. The findings presented herein about the preferential
hydration of the minor groove correspond very well to results from X-ray
crystallographic and NMR studies. Possible reasons for the stronger binding
of water molecules in the minor groove may include the following: the high
density of the charged rows of phosphate groups, steric constraints, and
specific water-water and water-DNA interactions.

Regions where water binds tightly on a protein are regions that are not
candidates for ORF binding; thus, the remaining sites on the protein not
occupied by water are candidates for good ORF binding.
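The exclusion of water-occupied regions can be sketched as a simple distance filter applied to the fragment positions that survive annealing. This is an illustration under assumptions: "occupied by water" is taken to mean that a retained (strongly bound) water lies within a hypothetical distance `cutoff` of the fragment position, and both inputs are plain coordinate lists.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def orf_sites_free_of_water(orf_sites: Sequence[Vec3], water_sites: Sequence[Vec3],
                            cutoff: float) -> List[Vec3]:
    """Keep ORF positions farther than `cutoff` from every strongly bound water.

    Both inputs are positions that survived annealing to the chosen B; the
    distance cutoff used to decide that an ORF and a water occupy the same
    site is a hypothetical parameter.
    """
    return [s for s in orf_sites
            if all(math.dist(s, w) > cutoff for w in water_sites)]
```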

Agonists: Assays and Molecules

Candidate bioactive agents identified by the methods of the invention
can be tested to assess their binding to the macromolecule in question.
Because such macromolecules are responsible for many biological functions,
including disease states, it is desirable to devise screening methods to identify
compounds that stimulate or inhibit the function of the macromolecule.
Accordingly, in a further aspect, the present invention provides a method of
screening compounds to identify those that stimulate or inhibit the function
of such a macromolecule. In general, agonists or antagonists can be employed
for therapeutic and prophylactic purposes for diseases. Compounds can be
identified from a variety of sources, for example, cells, cell-free preparations,
chemical libraries, and natural product mixtures.

The screening methods can simply measure the binding of a candidate
compound to the macromolecule, or to cells or membranes bearing the
macromolecule. The macromolecule can be a variant of the macromolecule
used in the simulation method, such as a fragment retaining the binding site
identified in the simulation, or a fusion protein used to make recombinant
synthetic methods more practical. The screening method can involve
competition with a labeled competitor. Further, these screening methods can
test whether the candidate compound results in a signal generated by
activation or inhibition of the macromolecule, using detection systems
appropriate to the cells comprising the macromolecule. Inhibitors of
activation are generally assayed in the presence of a known agonist, and the
effect of the candidate compound on agonist-induced activation is observed.
Further, the screening methods can simply comprise the steps of mixing a
candidate compound with a solution containing a macromolecule, measuring
macromolecule activity in the mixture, and comparing the activity of the
mixture to a standard.

The invention also provides a method of screening compounds to
identify those that enhance (agonist) or block (antagonist) the action of
macromolecules, including association of the macromolecule with itself or
another macromolecule. The method of screening can involve high-throughput
techniques. For example, to screen for agonists or antagonists, a
synthetic reaction mix, a cellular compartment, such as a membrane, cell
envelope or cell wall, or a preparation of any thereof, comprising the
macromolecule and a labeled substrate or ligand of such polypeptide is
incubated in the absence or the presence of a candidate molecule that can be
an agonist or antagonist. The ability of the candidate molecule to agonize or
antagonize the macromolecule is reflected in decreased binding of the labeled
ligand or decreased production of product from a substrate. Molecules that
bind gratuitously, i.e., without inducing the effects of the macromolecule, are
most likely to be good antagonists. Molecules that bind well and, as the case
may be, increase, for example, the rate of product production from substrate,
increase signal transduction, or increase chemical channel activity are
agonists. Detection of the rate or level of, as the case may be, production of
product from substrate, signal transduction, or chemical channel activity can
be enhanced by using a reporter system. Reporter systems that can be useful
in this regard include but are not limited to colorimetric labeled substrate
converted into product, a reporter gene that is responsive to changes in
macromolecule activity, and binding assays known in the art.
All publications and references, including but not limited to patents and patent
applications, cited in this specification are herein incorporated by reference in
their entirety as if each individual publication or reference were specifically
and individually indicated to be incorporated by reference herein as being
fully set forth. Any patent application to which this application claims priority
is also incorporated by reference herein in its entirety in the manner described
above for publications and references.

While this invention has been described with an emphasis upon 
preferred embodiments, it will be obvious to those of ordinary skill in the art 
that variations in the preferred devices and methods may be used and that it is
intended that the invention may be practiced otherwise than as specifically 
described herein. Accordingly, this invention includes all modifications 
encompassed within the spirit and scope of the invention as defined by the 
claims that follow. 



WHAT IS CLAIMED IS: 



1 . A method of identifying binding sites on a macromolecule comprising: 

(a) for at least one organic fragment (ORF), conducting, at
separate values of parameter B, two or more simulated annealing of chemical
potential calculations using the ORF as the inserted solvent; and

(b) comparing converged solutions from step (a) to identify
first locations at which the relevant ORF is strongly bound, thereby identifying
candidate sites for binding ligand molecules.

2. The method of claim 1, further comprising: 

(c) identifying clusters of sites that stam^ 

3. The method of claim 2, further comprising: 

(d) conducting steps (a) and (b) for each of two or more 
ORFs and identifying clusters where two or more distinct ORFs bind. 

4. The method of claim 3, wherein a cluster that binds three or more distinct 
ORFs is identified. 

5. The method of claim 3, further comprising reducing the binding stringency 
in the vicinity of a cluster to further identify elements that would 
contribute to the binding of a bioactive agent.

6. The method of claim 1, further comprising: 

(e) conducting, at separate values of a measure of chemical
potential, two or more simulated annealing of chemical potential calculations
using water as the inserted solvent;

(f) comparing converged solutions from step (e) to identify
locations at which water is strongly bound, thereby identifying water locations
which are not candidate sites for binding ligand molecules; and

(g) identifying first locations that are not water locations.

7. The method of claim 1, wherein the simulated annealing of chemical
potential calculations comprise multiple steps of sampling, and wherein in
a number of steps of the sampling the ORF's position is changed by a small
amount and the resulting new position is accepted or rejected based on the
change in energy as a result of the change attempted.



8. A method of identifying the chemical characteristics of compounds that
bind a macromolecule comprising examining the functionalities and
relative orientations of the ORFs found in a cluster pursuant to the binding
site identifying method of claim 3.

9. A method of conducting combinatorial chemistry to identify compounds
that interact with a macromolecule comprising:

(a) identifying classes of reactants that are modeled by the
functionalities of the ORFs found in a cluster pursuant to the binding site
identifying method of claim 3;

(b) designing a combinatorial synthetic protocol that calls
for two or more synthetic procedures that react reagents of at least two of the
classes identified in step (a); and

(c) conducting the combinatorial synthetic protocol to
create candidate binding molecules.



10. A computer implemented method of analyzing a macromolecule for
potential binding sites, comprising:

(1) positioning an instance of a computer representation of
an organic fragment at a plurality of potential binding sites of the
macromolecule;

(2) selecting a value of B, wherein B = μ′/kT + ln⟨N⟩,
where μ′ is the excess chemical potential, k is Boltzmann's constant, T is the
absolute temperature, and ⟨N⟩ is the mean number of molecules of the
organic fragment;

(3) repositioning the instances of the organic fragment until
a minimized energy state is obtained;

(4) assessing, for each instance of the repositioned organic
fragment, whether the repositioned organic fragment binds to the
macromolecule at the associated potential binding site at the selected value
of B;

(5) deleting instances of the organic fragment that do not
bind at the associated potential binding site at the selected value of B;

(6) repeating steps (1) through (5) at a lesser value of B;
and

(7) outputting a list of undeleted instances of the organic
fragment;

provided that the organic fragment is not water.

11. The method according to claim 10, wherein the potential binding sites
comprise an unbiased sampling of sites of the macromolecule.

12. The method according to claim 10, wherein said positioning comprises
imposing a grid on the computer readable representation of the
macromolecule, wherein the potential binding sites comprise the grid
intersection points.

13. The method according to claim 12, wherein the spacing of the grid is less
than the smallest cross-section of the organic fragment.
14. The method according to claim 12, wherein the potential binding sites
further comprise points within a sphere shape centered at a grid point and
having a diameter of about 10% of the grid spacing.

15. The method according to claim 12, wherein the potential binding sites
further comprise points within a rectangular solid shape centered at a grid
point and having dimensions of about 10% of the grid spacing.

16. The method according to claim 10, further comprising:
(8) repeating steps (1) through (7) for one or more
organic fragments.



17. The method according to claim 10, further comprising:

(8) repeating steps (1) through (6) wherein the organic
fragment is a water molecule, wherein step (7) further comprises outputting a
list of undeleted instances of the organic fragment that are not associated with
undeleted instances of the water molecule.

18. The method according to claim 16, further comprising:
(9) repeating steps (1) through (6) wherein the organic
fragment is a water molecule, wherein step (7) further comprises outputting a
list of undeleted instances of the organic fragments that are not associated with
undeleted instances of the water molecule.

19. The method according to claim 10, further comprising:

(8) determining potential binding sites from the list of
undeleted instances of the organic fragment.

20. The method according to claim 19, further comprising:

(9) selecting compounds exhibiting a functionality and
relative orientation of the undeleted instances of the organic fragment; and

(10) conducting binding or functional assays to identify the
selected compounds that bind or affect the function of the macromolecule.



21. The method according to claim 16, further comprising:

(9) determining potential binding sites from the list of
undeleted instances of the organic fragments.

22. The method according to claim 21, further comprising:

(10) selecting compounds exhibiting functionalities and
relative orientations of the undeleted instances of the organic fragments; and

(11) conducting binding or functional assays to identify the
selected compounds that bind or affect the function of the macromolecule.

23. The method according to claim 10, wherein step (3) comprises
performing a grand-canonical Monte Carlo simulation.